Abstract

We study the height of the delta peak at 0 in the spectrum of
incidence matrices of random trees. We show that the average fraction
of the spectrum occupied by the eigenvalue 0 in a large random tree is
asymptotic to $2x_* - 1 = 0.1342865808195677459999\cdots$, where $x_*$
is the unique real root of $x = e^{-x}$. For finite trees, we give a
closed form, a generating function, and an asymptotic estimate for the
sequence $(z_n)_{n \ge 1} = 1, 0, 3, 8, 135, 1164, 21035, \cdots$ of the
total multiplicity of the eigenvalue 0 in the set of $n^{n-2}$ tree
incidence matrices of size $n$.

1. Introduction. 

By a classical result in graph theory, the number of labeled trees^1 on
$n \ge 1$ vertices is $n^{n-2}$. We endow the set $T_n$ of labeled trees
on $n \ge 1$ vertices with the uniform probability, giving weight
$n^{2-n}$ to each tree.

Each tree in $T_n$ comes with its incidence matrix, the $n \times n$
matrix with entry $ij$ equal to 1 if there is an edge between vertices
$i$ and $j$ and to 0 otherwise. Each such (symmetric) matrix has $n$
(real) eigenvalues, which by definition form the spectrum of the
corresponding tree. This leads in turn to $n \cdot n^{n-2} = n^{n-1}$
eigenvalues, counted with multiplicities, for $T_n$ as a whole. In the
sequel, we wish to concentrate on the multiplicity of the eigenvalue 0.
Let $Z(T)$ be the multiplicity of the eigenvalue 0 in the spectrum of
the incidence matrix of the tree $T$, i.e. the dimension of the kernel.
For each $n \ge 1$, the restriction $Z_n$ of $Z$ to $T_n$ is a random
variable. We set $z_n = \sum_{T \in T_n} Z_n(T)$. The expectation of
$Z_n$ is $E(Z_n) = z_n / n^{n-2}$.

1 Precise definitions for this and the following terms can be found in
Section 2.

To illustrate these definitions, we give a direct counting of
$z_1, \cdots, z_4$ in Appendix A.
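
The definition of $z_n$ can also be checked by brute force for small $n$. The Python sketch below is an illustration and not part of the original argument: it enumerates the labeled trees through Prüfer sequences (a choice made for this sketch; any enumeration of labeled trees would do) and computes each kernel dimension by exact Gaussian elimination over the rationals. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def prufer_to_edges(seq, n):
    """Decode a Pruefer sequence into the edges of a labeled tree on 0..n-1."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append((u, w))
    return edges


def nullity(n, edges):
    """Dimension of the kernel of the n x n incidence matrix of the graph."""
    rows = [[Fraction(0)] * n for _ in range(n)]
    for a, b in edges:
        rows[a][b] = rows[b][a] = Fraction(1)
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(n):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return n - rank


def total_multiplicity(n):
    """z_n: sum of the multiplicity of the eigenvalue 0 over all labeled trees."""
    if n == 1:
        return 1  # single vertex, incidence matrix (0)
    return sum(nullity(n, prufer_to_edges(seq, n))
               for seq in product(range(n), repeat=n - 2))


print([total_multiplicity(n) for n in range(1, 6)])
```

The cost grows like $n^{n-2}$, so this is only practical for very small $n$; it is meant as a sanity check of the definitions, not as a way to compute the sequence.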

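Our aim is to prove:

Theorem 1. Let $z_n$ be the total multiplicity of the eigenvalue 0 in
the spectra of the $n^{n-2}$ labeled trees on $n$ vertices. Then:
i) Closed form:

$$ z_n = n^{n-1} + 2 \sum_{2 \le m \le n} (-1)^{m-1} \binom{n-1}{m-1} m^{m-2} n^{n-m}, $$

or equivalently

$$ \frac{z_n}{n^{n-2}} = n + 2 \sum_{2 \le m \le n} (-1)^{m-1} \binom{n}{m} \left( \frac{m}{n} \right)^{m-1}. $$

ii) Formal power series identity:

$$ x^2 + 2x - x e^x = \sum_{n \ge 1} \frac{z_n}{n!} \left( x e^x e^{-x e^x} \right)^n. $$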



and 



Corollary 2. For large $n$, $E(Z_n)$ has an asymptotic expansion in
powers of $1/n$, whose first two terms are

$$ E(Z_n) = (2x_* - 1)\, n + \frac{x_*^2 (x_* + 2)}{(x_* + 1)^3} + O(1/n), $$

where $x_* = 0.5671432904097838729999\cdots$ is the unique real root of
$x = e^{-x}$. In particular, the average fraction of the spectrum
occupied by the eigenvalue 0 in a large random tree is asymptotic to
$2x_* - 1 = 0.1342865808195677459999\cdots$.
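
As a numerical illustration (a sketch, not part of the proof), the constants of the corollary can be evaluated and compared with the exact expectations for small $n$. The code below assumes that the second term of the expansion reads $x_*^2 (x_* + 2)/(x_* + 1)^3$, uses Newton's method to solve $x = e^{-x}$, and takes the values of $z_n$ from the list given in Remark 4; the names `x_star` and `Z_VALUES` are hypothetical.

```python
from math import exp


def x_star(iterations=60):
    """Real root of x = exp(-x), by Newton's method on f(x) = x - exp(-x)."""
    x = 0.5
    for _ in range(iterations):
        x -= (x - exp(-x)) / (1.0 + exp(-x))
    return x


# First terms of (z_n), n = 1..11, as listed in Remark 4.
Z_VALUES = [1, 0, 3, 8, 135, 1164, 21035, 322832,
            7040943, 153153620, 4048737099]

xs = x_star()
slope = 2 * xs - 1                              # leading coefficient 2 x_* - 1
constant = xs ** 2 * (xs + 2) / (xs + 1) ** 3   # assumed second term

for n, z in enumerate(Z_VALUES, start=1):
    exact = z / n ** (n - 2)                    # E(Z_n) = z_n / n^(n-2)
    print(n, exact, slope * n + constant)
```

For these small values of $n$ the two printed columns should only agree roughly, since the $O(1/n)$ remainder and the subdominant saddle point contributions are not yet negligible.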



Remark 3. We do not try to justify here that fluctuations in random
trees become small when the number of vertices is large. However, it is
expected that $E(Z_n^2) - E(Z_n)^2$ grows only linearly with the number
of vertices, so that in an appropriate sense the fraction of the
spectrum occupied by the eigenvalue 0 in an infinite random tree is
$2x_* - 1$ with probability 1.
Remark 4. With the explicit formula above, it is easy to list the first
terms in the sequence $(z_n)_{n \ge 1}$, which are

1, 0, 3, 8, 135, 1164, 21035, 322832, 7040943, 153153620, 4048737099, ...
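
A minimal sketch of how such a list can be produced: the Python function below evaluates the closed form in the shape $z_n = n^{n-1} + 2 \sum_{2 \le m \le n} (-1)^{m-1} \binom{n-1}{m-1} m^{m-2} n^{n-m}$, which is what the counting argument of Section 4 gives after rearrangement. The function name is hypothetical, and the formula is our reading of part i) of the theorem.

```python
from math import comb


def z_closed_form(n):
    """Total multiplicity of the eigenvalue 0 over labeled trees on n vertices."""
    total = n ** (n - 1)  # contribution of the one-vertex subtrees (m = 1)
    for m in range(2, n + 1):
        sign = (-1) ** (m - 1)
        total += 2 * sign * comb(n - 1, m - 1) * m ** (m - 2) * n ** (n - m)
    return total


print([z_closed_form(n) for n in range(1, 12)])
```

All exponents are nonnegative integers, so the computation stays in exact integer arithmetic.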

To prove part i) of Theorem 1, we establish a few preparatory lemmas
of independent interest. Then we prove ii) using Lagrange inversion
and get Corollary 2 with the steepest descent method.

But first, we need to fix conventions and notations. 

2. Definitions. 

Even if we are interested ultimately only in trees, we shall need
more general graphs (for instance, forests) in the proofs, so we give
for the sake of completeness a collection of definitions. Most of them
are standard, and the reader is encouraged to skip this section and
come back to it only when needed. The fundamental definition is

Definition 5. A simple graph $G$ is a pair $(V, E)$ where $V$ is a
finite set called the set of vertices and $E$ is a subset of
$V^{(2)} = \{ \{x, y\} : x \in V, y \in V, x \ne y \}$ called the set of
edges.

Remark 6. The adjective simple refers to the fact that there is at 
most one edge between two vertices and that edges are pairs of distinct 
vertices. As we have no use of more general graphs in the sequel, we 
shall from now on use graph for simple graph. 

Definition 7. If V is empty, then we say that the graph G is empty. 
If {x, y} belongs to E, we say that there is an edge between x and y 
and that x and y are adjacent vertices in G. The vertices adjacent to a 
given vertex x are called the neighbors of x. The number of neighbors 
of a vertex x is called the degree of x. A leaf of G is a vertex of degree 
1. Two edges of G with a common vertex are called adjacent edges.

Definition 8. A labeled graph on $n \ge 1$ vertices is a graph with
vertex set $[n] = \{1, \cdots, n\}$. The incidence matrix of a labeled
graph on $n$ vertices is the $n \times n$ matrix with entry $ij$ equal
to 1 if there is an edge between vertices $i$ and $j$ and to 0
otherwise.

Remark 9. If the graph $G$ has $|V| = n \ge 1$ vertices^2, any bijection
between $V$ and $[n]$ defines a labeled graph. The incidence matrices
for different bijections differ only by a permutation of the rows and
columns. In particular the eigenvalues are independent of the
bijection. They are real because, by construction, incidence matrices
are symmetric.

2 For any finite set $S$, $|S|$ is the number of elements in $S$.

Definition 10. The spectrum of a graph is the set of eigenvalues 
(counted with multiplicities) of any of the associated incidence matri- 
ces. By convention, the spectrum of the empty graph is empty. 

Definition 11. A subgraph of a graph $G = (V, E)$ is a graph $(W, F)$
such that $W \subset V$ and $F \subset E$. An induced subgraph of $G$ is
a graph $(W, F)$ such that $W \subset V$ and $F = E \cap W^{(2)}$.

Definition 12. We say that two vertices $x$ and $x'$ in $V$ are in the
same component of $G$ if there is a sequence $x = x_1, \cdots, x_n = x'$
in $V$ such that adjacent terms in the sequence are adjacent in $G$
(taking $n = 1$ shows that, luckily, $x$ and $x$ are in the same
component). This gives a partition of $V$. Each component defines an
induced subgraph of $G$ which is called a connected component of $G$.
Then $G$ can be thought of as the disjoint union of its connected
components. We say that $G$ is connected if it has only one connected
component.

Definition 13. A polygon in a graph $G$ is a sequence
$x_0, x_1, \cdots, x_n$, $n \ge 3$, of vertices such that adjacent terms
in the sequence are adjacent in $G$, $x_0 = x_n$, and
$x_1, \cdots, x_n$ are distinct.

Definition 14. A forest is a graph without polygons. A tree is a 
non-empty connected forest. 

Remark 15. Clearly a subgraph of a forest is a forest. The connected
components of a non-empty forest are trees. One shows easily that
a tree with $n \ge 2$ vertices has at least two leaves. Then a simple
induction shows that a tree is exactly a connected graph for which the
number of vertices is 1 plus the number of edges. A classical theorem
of Cayley states that there are $n^{n-2}$ labeled trees on $n$ vertices
(see for instance proposition 5.3.2 in [3]).

3. Two preparatory lemmas.

The first lemma is a characterization of the dimension of the kernel
of incidence matrices, viewed as a function on forests.
Lemma 16. The function $Z$ which associates to any forest the
multiplicity of the eigenvalue 0 in its spectrum is characterized by
the following properties:

i) The function $Z$ takes the value 0 on $\emptyset$, the empty forest.

ii) The function $Z$ takes the value 1 on •, the forest with one vertex.

iii) The function $Z$ is additive on disjoint components, i.e. if the
forest $F$ is the union of two disjoint forests $F_1$ and $F_2$ then
$Z(F) = Z(F_1) + Z(F_2)$.

iv) The function $Z$ is invariant under "leaf removal", i.e. if $x$ is
a leaf of $F$, $y$ its (unique) neighbor, $V' = V \setminus \{x, y\}$,
and $F'$ is the subforest of $F$ induced by $V'$, then $Z(F) = Z(F')$.

Remark 17. That the function $Z$ fulfills properties i)-iv) was no
doubt known decades ago (see for instance section 8.1, Hückel's theory,
in [?]). We give a proof, because in the sequel we want to emphasize
and use the simple fact that these properties characterize the function
$Z$.



Proof of Lemma 16. First, we show that the function $Z$ has properties
i)-iv). In fact, this is true for general graphs (not only forests).
Properties i) and ii) follow from the definition of $Z$; property iii)
follows from the fact that the incidence matrix can be put in block
diagonal form, each block corresponding to a connected component.
Property iv) is only slightly more complicated. With an appropriate
labeling of the vertices, the incidence matrix $M$ of $F$ can be
decomposed as

$$ M = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & N \\ 0 & {}^t N & M' \end{pmatrix} $$

where the first row and column are indexed by the leaf $x$, the second
row and column are indexed by its neighbor $y$, $N$ describes the edges
between this neighbor and $V'$, and $M'$ is the incidence matrix for
$V'$. Then $v = {}^t(v_1, v_2, v')$ is in the kernel of $M$ if and only
if

$$ v_2 = 0, \qquad v_1 = -N v', \qquad M' v' = -{}^t N v_2. $$

So $v_2 = 0$, which inserted in the third equation gives $M' v' = 0$,
implying that $v'$ is in the kernel of $M'$, and then the second
equation just fixes $v_1$ to the appropriate value. So the kernels of
$M$ and $M'$ have the same dimension. This proves iv).

Now, any tree with more than 1 vertex has leaves, so leaf removal
as defined in iv) allows us to reduce the forest $F$ to a (possibly
empty) family of isolated vertices (all connected components have only
one vertex). Hence, there is at most one function, namely $Z$, that can
satisfy properties i)-iv).

Remark 18. Leaf removal and additivity give an efficient algorithm to
compute the multiplicity of the eigenvalue 0 for a given forest,
especially when this forest is given as a drawing.
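
The algorithm alluded to here can be sketched as follows. This Python function is an illustration under the assumption that the input really is a forest, given as a list of vertices and a list of edges; the name `zero_multiplicity` and the data layout are hypothetical. It repeatedly removes a leaf together with its neighbor (property iv) and finally counts the isolated vertices that remain (properties ii and iii).

```python
def zero_multiplicity(vertices, edges):
    """Multiplicity of the eigenvalue 0 for a forest, by leaf removal."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Candidate leaves; entries may become stale and are re-checked below.
    stack = [v for v in neighbors if len(neighbors[v]) == 1]
    while stack:
        x = stack.pop()
        if x not in neighbors or len(neighbors[x]) != 1:
            continue  # x was removed, or is no longer a leaf
        (y,) = neighbors[x]
        # Remove y, and detach it from its other neighbors.
        for w in neighbors.pop(y):
            if w != x:
                neighbors[w].discard(y)
                if len(neighbors[w]) == 1:
                    stack.append(w)
        del neighbors[x]

    # A forest without leaves has no edges: only isolated vertices remain,
    # and each of them contributes 1 to the multiplicity.
    return len(neighbors)


# Path on three vertices, and star with center 2 and three leaves.
print(zero_multiplicity([1, 2, 3], [(1, 2), (2, 3)]))
print(zero_multiplicity([1, 2, 3, 4], [(2, 1), (2, 3), (2, 4)]))
```

Each vertex and edge is handled a bounded number of times, so the running time is linear in the size of the forest, up to the cost of the set operations.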

The next lemma gives a practically awful but theoretically useful 
formula for the function Z. 

Lemma 19. Let $L$ be the function on forests defined by
i') The function $L$ takes value 0 on $\emptyset$, the empty forest.
ii') The function $L$ takes value 1 on •, the forest with one vertex.
iii') The function $L$ takes value 0 on disconnected forests.
iv') The function $L$ takes value $2(-1)^{n-1}$ on trees with $n \ge 2$
vertices.
Then, for any forest $F$,

$$ Z(F) = \sum_{F' \subset F} L(F') = \sum_{T' \subset F} L(T') $$

where the first sum is over induced subforests of $F$, and the second
over induced subtrees of $F$.
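
The formula of the lemma can be tested directly on small forests. The sketch below is an illustration only: it sums the weight 1 for one-vertex subtrees and $2(-1)^{k-1}$ for induced subtrees on $k \ge 2$ vertices, over all vertex subsets of a forest. It assumes the input is a forest, so that an induced subgraph is a tree exactly when it has one edge fewer than vertices; the function names are hypothetical.

```python
from itertools import combinations


def induces_tree(subset, edges):
    """True if the subgraph of a forest induced by `subset` is a tree."""
    chosen = set(subset)
    inner = sum(1 for a, b in edges if a in chosen and b in chosen)
    # An induced subgraph of a forest is a forest with
    # (vertices - edges) components, so it is connected iff they differ by 1.
    return inner == len(chosen) - 1


def weight(k):
    """Value of L on a tree with k vertices."""
    return 1 if k == 1 else 2 * (-1) ** (k - 1)


def z_by_subtree_sum(vertices, edges):
    """Sum of L over all non-empty induced subtrees of the forest."""
    vertices = list(vertices)
    total = 0
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            if induces_tree(subset, edges):
                total += weight(k)
    return total


# Path on three vertices: 3 * 1 + 2 * (-2) + 1 * 2
print(z_by_subtree_sum([1, 2, 3], [(1, 2), (2, 3)]))
```

The number of subsets is exponential in the number of vertices, which is one way to see why the formula is practically awkward even though it is useful in proofs.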

Remark 20. For a given forest, there is a much nicer formula, directly
connected to the geometry of the forest (again, see for instance section
8.1, Hückel's theory, in [?]). In fact, let $Q(F)$ be the maximum among
the cardinalities of sets of pairwise non-adjacent edges in $F$, and
$N(F)$ be the number of vertices in $F$. Then $Z(F) = N(F) - 2Q(F)$. It
is easy to show that $N(F) - 2Q(F)$ satisfies properties i)-iv) of
Lemma 16. In particular, a possible way to maximize the number of
non-adjacent edges in $F$ in the situation iv) is to do so on $F'$ and
add the edge $\{x, y\}$. Anyway, this explicit formula allows us to
restate our theorems in terms of the random variable $Q_n$, the
restriction of $Q$ to $T_n$. For instance, in a large random tree on $n$
vertices, one can find about $(1 - x_*) n$ pairwise non-adjacent edges.
Note that $1 - x_* = 0.4328567095902161270000\cdots$ is not much smaller
than 0.5 (the upper bound for $Q(T)/N(T)$ for a given tree, because
$Z(T) = N(T) - 2Q(T)$ is always nonnegative).



Proof of Lemma 19. Our strategy is to use the characterization of $Z$
in Lemma 16. First, we observe that the second equality is a trivial
consequence of i') and iii'). We define a new function $Z'$ on the set
of forests by

$$ Z'(F) = \sum_{T' \subset F} L(T') $$

(where the sum is over induced subtrees of $F$) and show that $Z'$
satisfies properties i)-iv) of Lemma 16.

As the empty forest has no non-empty induced subtree, i') implies i).

In the same vein, the forest with one vertex has only one non-empty
induced subtree, namely itself, so ii') implies ii).

If the forest $F$ is the union of two disjoint forests $F_1$ and $F_2$,
an induced subtree of $F$ is either an induced subtree of $F_1$ or an
induced subtree of $F_2$, and the sum defining $Z'(F)$ splits as
$Z'(F_1) + Z'(F_2)$, showing that $Z'$ satisfies property iii).

Now, if $x$ is a leaf of $F$ and $y$ its neighbor, we define
$V' = V \setminus \{x, y\}$, $V'' = \{x, y\}$ and consider $F'$ and
$F''$, the subforests of $F$ induced by $V'$ and $V''$ respectively. We
split the sum defining $Z'(F)$ in three pieces. The first is over the
induced subtrees of $F'$. This is just the sum defining $Z'(F')$. The
second is over the induced subtrees of $F''$, which is a tree on two
vertices. Its subtrees are itself, with weight
$L(F'') = 2(-1)^{2-1} = -2$, and two trees with one vertex, each with
weight $L(\bullet) = 1$, so this second sum gives 0. The third sum is
over induced subtrees that have vertices in both $V'$ and $V''$. If this
sum is not empty, every tree that appears in it has $y$ as a vertex (by
connectivity) and has at least two vertices (because the tree
consisting of $y$ alone has already been counted). Then we can group
these trees in pairs, a tree containing $x$ being paired with the same
tree but with $x$ and the edge $\{x, y\}$ deleted. The function $L$
takes opposite values on the two members of a pair, so the third sum
contributes 0. Hence $Z'(F) = Z'(F')$, so $Z'$ satisfies property iv).
By Lemma 16, $Z' = Z$.

Remark 21. These two lemmas have an obvious extension to bicolored
forests. If we use black and white as colors, and count the zero
eigenvectors having value zero on white vertices, we only need to
replace ii) in Lemma 16 by

ii) The function $Z$ takes value 1 on •, the forest with one vertex
colored in black, and 0 on ∘, the forest with one vertex colored in
white.

and ii') and iv') in Lemma 19 by

ii') The function $L$ takes value 1 on •, the forest with one vertex
colored in black, and 0 on ∘, the forest with one vertex colored in
white.

iv') The function $L$ takes value $(-1)^{n-1}$ on trees with $n \ge 2$
vertices.

The proofs remain the same.
Remark 22. The formula

$$ Z(F) = \sum_{F' \subset F} L(F') $$

can be inverted using inclusion-exclusion to give

$$ L(F) = \sum_{F' \subset F} (-1)^{|V(F)| - |V(F')|} Z(F'). $$

This identity has an application in random graph theory [?], which is
why we got interested in Lemma 19 in the first place.



4. Main proofs. 
We now have the necessary tools to prove Theorem 1.

Proof of Theorem 1. By Lemma 19,

$$ z_n = \sum_{T \in T_n} \sum_{T' \subset T} L(T') = \sum_{m=1}^{n} \sum_{T \in T_n} \sum_{T' \subset T,\; T' \in T_m} L(T'), $$

where $T' \in T_m$ is shorthand for "$T'$ has $m$ vertices".
As the function $L$ depends only on the number of vertices, for fixed
$m$ the double sum $\sum_{T \in T_n} \sum_{T' \subset T,\, T' \in T_m}$ is
simply a multiplicity. We count this multiplicity as follows: we remove
from $T$ the edges of $T'$, so we are left with $m$ trees, each with a
special vertex, the one belonging to $T'$. This is by definition what
is called a planted forest (or rooted forest) with $n$ vertices and $m$
trees. The number of such objects is $m \binom{n}{m} n^{n-m-1}$ (see for
instance proposition 5.3.2 in [3]). Conversely, starting from such a
planted forest with $m$ trees (each with a special vertex) and $n$
vertices, we can build a tree on the special vertices in $m^{m-2}$
ways. So

$$ \sum_{T \in T_n} \sum_{T' \subset T,\; T' \in T_m} 1 = m^{m-1} \binom{n}{m} n^{n-m-1}. $$

Hence summation over $m$ gives

$$ z_n = n^{n-1} + 2 \sum_{2 \le m \le n} (-1)^{m-1} \binom{n}{m} n^{n-m-1} m^{m-1}. $$

Simple rearrangements lead to the two equivalent formulae in i), the
first one making clear that $z_n$ is an integer.

To obtain the generating function in ii), we need a mild extension
of the Lagrange inversion formula (see for instance section 5.4 in [3]),
which states that if $f(x)$ is a formal power series in $x$ starting as
$f(x) = x + O(x^2)$ and $g(x)$ is an arbitrary formal power series in
$x$,

$$ g(x) = g(0) + \sum_{n \ge 1} \frac{f(x)^n}{n} \left[ \frac{v^n g'(v)}{f(v)^n} \right]_{n-1} $$

where $[h(v)]_k$ is by definition the $k$-th coefficient of the formal
power series $h(v)$.

As an immediate application, we see that if $t = x e^x$ then

$$ x = \sum_{m \ge 1} (-m)^{m-1} \frac{t^m}{m!} $$

and

$$ x^2 + 2x = -2 \sum_{m \ge 1} (-m)^{m-2} \frac{t^m}{m!}. $$

Now we introduce $y = t e^{-t}$ and define a sequence $z'_n$, $n \ge 1$,
by

$$ x^2 + 2x - x e^x = \sum_{n \ge 1} z'_n \frac{y^n}{n!}, $$

but instead of applying directly the Lagrange inversion formula to
$y = x e^x e^{-x e^x}$, we first substitute the $t$-expansion (already
obtained by Lagrange inversion) on the left-hand side, which yields

$$ \sum_{n \ge 1} z'_n \frac{y^n}{n!} = -2 \sum_{m \ge 1} (-m)^{m-2} \frac{t^m}{m!} - t, $$

and then apply Lagrange inversion with $y = t e^{-t}$. The result is

$$ \frac{z'_n}{n!} = \frac{1}{n} \left[ e^{nt} \left( 1 - 2 \sum_{m \ge 2} \frac{(-m)^{m-2}}{(m-1)!} t^{m-1} \right) \right]_{n-1}. $$

Straightforward expansion of this formula shows that $z'_n = z_n$, and
this proves the generating function representation in ii).
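
The identity in ii) can also be checked mechanically to low order. The following sketch (an illustration, not the authors' derivation) works with truncated power series in $x$ with exact rational coefficients: it builds $y = x e^x e^{-x e^x}$ and $x^2 + 2x - x e^x$, then reads off the coefficients $z_n / n!$ of the expansion in powers of $y$, using the fact that $y^n = x^n + O(x^{n+1})$. The truncation order `N` and the helper names are arbitrary choices of this sketch.

```python
from fractions import Fraction
from math import factorial

N = 8  # truncation order: series are lists of coefficients of x^0..x^N


def mul(a, b):
    """Product of two truncated series."""
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[: N + 1 - i]):
                out[i + j] += ai * bj
    return out


def exp_series(a):
    """exp of a series with zero constant term, truncated at order N."""
    out = [Fraction(0)] * (N + 1)
    term = [Fraction(1)] + [Fraction(0)] * N  # holds a^k / k!
    for k in range(N + 1):
        out = [o + t for o, t in zip(out, term)]
        term = [c / (k + 1) for c in mul(term, a)]
    return out


x = [Fraction(0), Fraction(1)] + [Fraction(0)] * (N - 1)
t = mul(x, exp_series(x))                    # t = x e^x
y = mul(t, exp_series([-c for c in t]))      # y = t e^{-t}
lhs = [a + 2 * b - c for a, b, c in zip(mul(x, x), x, t)]  # x^2 + 2x - x e^x


def coefficients_in_powers_of(base, series):
    """Coefficients c_n with series = sum_n c_n base^n, for base = x + O(x^2)."""
    remaining = list(series)
    power = [Fraction(1)] + [Fraction(0)] * N
    coeffs = []
    for n in range(1, N + 1):
        power = mul(power, base)   # base^n, whose lowest term is x^n
        c = remaining[n]
        coeffs.append(c)
        remaining = [r - c * p for r, p in zip(remaining, power)]
    return coeffs


z = [c * factorial(n)
     for n, c in enumerate(coefficients_in_powers_of(y, lhs), start=1)]
print([int(v) for v in z])
```

Raising `N` extends the check to more terms at the cost of quadratic-time series products.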

Remark 23. The derivation of ii) is quite artificial. It turns out
that random graph theory gives a natural proof [?] using the formula
mentioned in Remark 22.



Proof of Corollary 2. This time we use Lagrange inversion with
$y = x e^x e^{-x e^x}$ in a contour integral representation^3. So

$$ \frac{z_n}{n!} = \frac{1}{n} \oint dx \, \frac{(1 + x)(2 - e^x)}{\left( x e^x e^{-x e^x} \right)^n} $$

where the contour is a small anticlockwise-oriented circle around the
origin. For large $n$ we use the steepest descent method to obtain the
asymptotic expansion of $z_n$. As
$\frac{d}{dx}\, x e^x e^{-x e^x} = (1 + x)(1 - x e^x) e^x e^{-x e^x}$,
the saddle points of $x e^x e^{-x e^x}$ are $x = -1$ and the solutions
to $x = e^{-x}$. This equation has a unique real root, $x_*$, which is
positive. Numerically, $x_* = 0.5671432904097838729999\cdots$. On the
other hand, $x = e^{-x}$ has an infinite number of complex solutions,
coming in complex conjugate pairs. Asymptotically, the imaginary parts
of these zeroes are evenly spaced by about $2\pi$, while their real
parts are negative and grow logarithmically in absolute value.
Consideration of the landscape produced by the modulus of the function
$x e^x e^{-x e^x}$ shows that the small circle around the origin can be
deformed to give the union of two steepest descent curves, one passing
through $x = -1$ and the other through $x = x_*$. These two curves are
asymptotic to the two lines $\mathrm{Im}\, x = \pm\pi$ as
$\mathrm{Re}\, x \to +\infty$. Hence, despite the fact that the value of
$x e^x e^{-x e^x}$ is the same, namely $1/e$, at all the complex saddle
points and at $x_*$, the complex saddle points do not contribute to the
asymptotic expansion of $z_n$ at large $n$. Moreover, the point
$x = -1$ only gives subdominant contributions because
$-e^{-1} e^{1/e}$ is larger than $1/e$ in absolute value. So we
concentrate on the asymptotic expansion around $x_*$. As

$$ \log\left( x e^x e^{-x e^x} \right) = -1 - \frac{(1 + x_*)^2}{2 x_*^2} (x - x_*)^2 + O((x - x_*)^3) $$

we infer that

$$ \frac{z_n}{n!}\, n^{3/2} e^{-n} $$

has an asymptotic expansion in powers of $1/n$. Hence, by use of
Stirling's formula for $n!$, we conclude that $E(Z_n) = z_n / n^{n-2}$
has an asymptotic expansion in powers of $1/n$. The first two terms are
obtained by brute force.

3 We include the factor $\frac{1}{2\pi i}$ in the symbol $\oint$.

Appendix A. Examples of direct multiplicity counting. 

This appendix gives the counting of trees and multiplicities of 0 in
the spectrum for trees on $n = 1, 2, 3$ or 4 vertices.

Example 24. For $n = 1$ there is only one tree, •, and one way to
label it, giving a total of $1 = 1^{1-2}$ tree on one vertex. The
incidence matrix is $(0)$, so the eigenvalue 0 occurs with multiplicity
$z_1 = 1$.

Example 25. For $n = 2$ there is only one tree, •-•, and one way to
label it, giving a total of $1 = 2^{2-2}$ tree on two vertices. The
incidence matrix is

$$ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} $$

so the eigenvalue 0 occurs with multiplicity $z_2 = 0$.

Example 26. For $n = 3$ there is only one tree, •-•-•, and three
ways to label it, giving a total of $3 = 3^{3-2}$ trees on three
vertices. Up to permutation of rows and columns, the incidence matrix
for each of these three labeled trees is

$$ \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} $$

which has zero as an eigenvalue with multiplicity 1 (a corresponding
eigenvector is ${}^t(1, 0, -1)$), so there is a total of $3 \times 1$
zero eigenvalues, and $z_3 = 3$.

Example 27. For $n = 4$ there are two trees, the path •-•-•-• (12 ways
to label it), and the star with one central vertex and three leaves
(4 ways to label it), giving a total of $12 + 4 = 16 = 4^{4-2}$ trees on
four vertices. Up to permutation of rows and columns, the two incidence
matrices are

$$ \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}. $$

The first does not have 0 as an eigenvalue, whereas the second has
zero as an eigenvalue with multiplicity 2 (corresponding eigenvectors
are for instance ${}^t(1, 0, -1, 0)$ and ${}^t(1, 0, 0, -1)$), so there
is a total of $12 \times 0 + 4 \times 2$ zero eigenvalues, and
$z_4 = 8$.